AITopics | model-based offline reinforcement learning

Collaborating Authors

model-based offline reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsDec-27-2025, 05:28:23 GMT

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible. In such domains, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be . An additional challenge of offline RL is avoiding, i.e. ensuring that state-action pairs visited by the policy remain near those in the dataset. Previous offline RL algorithms that consider risk combine offline RL techniques (to avoid distributional shift), with risk-sensitive RL algorithms (to achieve risk-aversion).

model-based offline reinforcement learning, name change, risk-sensitive perspective, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Add feedback

MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsDec-24-2025, 21:54:40 GMT

model-based offline reinforcement learning, morel, pessimistic mdp, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Add feedback

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Neural Information Processing SystemsDec-24-2025, 06:26:10 GMT

model-based offline reinforcement learning, optimal uniform ope, reward-free and task-agnostic, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Neural Information Processing SystemsDec-23-2025, 16:43:35 GMT

Model-based offline reinforcement learning (RL) aims to find highly rewarding policy, by leveraging a previously collected static dataset and a dynamics model. While the dynamics model learned through reuse of the static dataset, its generalization ability hopefully promotes policy learning if properly utilized. To that end, several works propose to quantify the uncertainty of predicted dynamics, and explicitly apply it to penalize reward. However, as the dynamics and the reward are intrinsically different factors in context of MDP, characterizing the impact of dynamics uncertainty through reward penalty may incur unexpected tradeoff between model utilization and risk avoidance. In this work, we instead maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the belief.

model-based offline reinforcement learning, name change, pessimism-modulated dynamic belief, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

Add feedback

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsMay-27-2025, 06:56:56 GMT

In offline reinforcement learning, a policy is learned using a static dataset in the absence of costly feedback from the environment. In contrast to the online setting, only using static datasets poses additional challenges, such as policies generating out-of-distribution samples. Model-based offline reinforcement learning methods try to overcome these by learning a model of the underlying dynamics of the environment and using it to guide policy search. It is beneficial but, with limited datasets, errors in the model and the issue of value overestimation among out-of-distribution states can worsen performance. Current model-based methods apply some notion of conservatism to the Bellman update, often implemented using uncertainty estimation derived from model ensembles.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsFeb-8-2025, 10:51:12 GMT

Additional Feedback: Most of recent offline RL algorithms rely on policy regularization where the optimizing policy is prevented from deviating too much from the data-logging policy. Differently, MOReL does not directly rely on the data-logging policy but exploits pessimism to a model-based approach, providing another good direction for offline RL. However, it would be more natural to penalize more to more uncertain states. For example, one classical model-based RL algorithm (MBIE-EB) constructs an optimistic MDP that rewarding the uncertain regions by the bonus proportional to the 1/sqrt(N(s,a)) where N(s,a) is the visitation count. In contrast, but similarly to MBIE-EB, we may consider a pessimistic MDP that penalizes the uncertain regions by the penalty proportional to the 1/sqrt(N(s,a)). How is it justified to use alpha greater than zero for USAD? - It would be great to see how sensitive the performance of the algorithm with respect to kappa in the reward penalty and threshold in USAD.

model-based offline reinforcement learning, offline reinforcement learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsFeb-8-2025, 10:51:05 GMT

All three reviewers have favourable opinion towards this paper. There are some minor questions or comments, but they can be addressed without requiring another round of reviewing. Therefore, I recommend acceptance of this work. I encourage the authors to incorporate the reviewers' comments and concerns as much as possible.

model-based offline reinforcement learning, morel, neurips paper, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsJan-20-2025, 02:37:40 GMT

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible. In such domains, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-averse. An additional challenge of offline RL is avoiding distributional shift, i.e. ensuring that state-action pairs visited by the policy remain near those in the dataset. Previous offline RL algorithms that consider risk combine offline RL techniques (to avoid distributional shift), with risk-sensitive RL algorithms (to achieve risk-aversion).

distributional shift, model-based offline reinforcement learning, risk-sensitive perspective, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsJan-16-2025, 07:41:36 GMT

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test for an agent's ability to effectively use historical data which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; (b) learning a near-optimal policy in this pessimistic MDP. The design of the pessimistic MDP is such that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP.

model-based offline reinforcement learning, offline rl, pessimistic mdp, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Neural Information Processing SystemsOct-11-2024, 00:06:15 GMT

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) problems with model-based methods (for episodic MDP) and provides a unified framework towards optimal learning for several well-motivated offline tasks. Uniform OPE \sup_\Pi Q \pi-\hat{Q} \pi \epsilon is a stronger measure than the point-wise OPE and ensures offline learning when \Pi contains all policies (the global class). In this paper, we establish an \Omega(H 2 S/d_m\epsilon 2) lower bound (over model-based family) for the global uniform OPE and our main result establishes an upper bound of \tilde{O}(H 2/d_m\epsilon 2) for the \emph{local} uniform convergence that applies to all \emph{near-empirically optimal} policies for the MDPs with \emph{stationary} transition. Here d_m is the minimal marginal state-action probability. Critically, the highlight in achieving the optimal rate \tilde{O}(H 2/d_m\epsilon 2) is our design of \emph{singleton absorbing MDP}, which is a new sharp analysis tool that works with the model-based approach.

model-based offline reinforcement learning, optimal uniform ope, reward-free and task-agnostic, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback